Unsupervised Bayesian Part of Speech Inference with Particle Gibbs

نویسندگان

  • Gregory Dubbin
  • Phil Blunsom
چکیده

As linguistic models incorporate more subtle nuances of language and its structure, standard inference techniques can fall behind. These models are often tightly coupled such that they defy clever dynamic programming tricks. Here we demonstrate that Sequential Monte Carlo approaches, i.e. particle filters, are well suited to approximating such models. We implement two particle filters, which jointly sample either sentences or word types, and incorporate them into a Particle Gibbs sampler for Bayesian inference of syntactic part-of-speech categories. We analyze the behavior of the samplers and compare them to an exact block sentence sampler, a local sampler, and an existing heuristic word type sampler. We also explore the benefits of mixing Particle Gibbs and standard samplers.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Unsupervised Part of Speech Inference with Particle Filters

As linguistic models incorporate more subtle nuances of language and its structure, standard inference techniques can fall behind. Often, such models are tightly coupled such that they defy clever dynamic programming tricks. However, Sequential Monte Carlo (SMC) approaches, i.e. particle filters, are well suited to approximating such models, resolving their multi-modal nature at the cost of gen...

متن کامل

Models and Training for Unsupervised Preposition Sense Disambiguation

We present a preliminary study on unsupervised preposition sense disambiguation (PSD), comparing different models and training techniques (EM, MAP-EM with L0 norm, Bayesian inference using Gibbs sampling). To our knowledge, this is the first attempt at unsupervised preposition sense disambiguation. Our best accuracy reaches 56%, a significant improvement (at p <.001) of 16% over the most-freque...

متن کامل

Unsupervised Learning of Lexical Information for Language Processing Systems

Natural language processing systems such as speech recognition and machine translation conventionally treat words as their fundamental unit of processing. However, in many cases the definition of a “word” is not obvious, such as in languages without explicit white space delimiters, in agglutinative languages, or in streams of continuous speech. This thesis attempts to answer the question of whi...

متن کامل

Bayesian Inference of (Co) Variance Components and Genetic Parameters for Economic Traits in Iranian Holsteins via Gibbs Sampling

The aim of this study was using Bayesian approach via Gibbs sampling (GS) for estimating genetic parameters of production, reproduction and health traits in Iranian Holstein cows. Data consisted of 320666 first- lactation records of Holstein cows from 7696 sires and 260302 dams collected by the animal breeding center of Iran from year 1991 to 2010. (Co) variance components were estimated using ...

متن کامل

Bayesian Inference for Finite-State Transducers

We describe a Bayesian inference algorithm that can be used to train any cascade of weighted finite-state transducers on end-toend data. We also investigate the problem of automatically selecting from among multiple training runs. Our experiments on four different tasks demonstrate the genericity of this framework, and, where applicable, large improvements in performance over EM. We also show, ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012